feat(q4_0): first-class Q4_0 core format + scalar kernel + SPI #648
Merged
Promotes Q4_0 (older GGML 4-bit, 18 bytes / 32 elements) from a JVM/MemSegment-only side-path to a first-class quantized format that any loader can produce and any backend can specialize, mirroring Q8_0:

- commonMain `Q4_0TensorData` interface + `Q4_0BlockTensorData` (heap, ByteArray-backed) with `toFloatArray()` dequant and PackedBlockStorage.
- `TensorEncoding.Q4_0` (32 elems / 18 bytes).
- `Q4_0MatmulKernel` SPI + `KernelProvider.matmulQ4_0()` (default null) and a `"Q4_0"` case in `supports()`.
- `ScalarQ4_0MatmulKernel` (portable commonMain floor) wired through `ScalarKernelProvider`.
- `DefaultCpuOpsJvm`: lazy `q4_0MatmulKernel` resolved via KernelRegistry + an `is Q4_0TensorData ->` branch in `chooseQuantizedMatmul`.

Uses the canonical ggml *split* nibble layout (low nibbles → elements 0..15, high → 16..31, `(code - 8) * d`) matching `DequantOps.dequantQ4_0FromBytes`, NOT the interleaved layout the existing JVM MemSeg `dotQ4_0BlockMemSeg` uses (that mismatch is the likely reason the Q4_0 MemSeg path was never exercised; PR2 reconciles it).

Tests: Q4_0TensorDataTest (layout/dequant), Q4_0MatmulDispatchTest (scalar==dispatch), KernelProviderSupportsTest extended for Q4_0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First of a stacked series promoting Q4_0 (older GGML 4-bit, 18 bytes / 32 elements) from a JVM/MemSegment-only side-path to a first-class quantized format — mirroring how Q8_0 is wired — so any loader can produce it and any backend can specialize it.
What's in this PR (Phase A, part 1)
- `Q4_0TensorData` interface + `Q4_0BlockTensorData` (ByteArray-backed, `PackedBlockStorage`, `toFloatArray()`), plus `TensorEncoding.Q4_0` (32 elems / 18 bytes).
- `Q4_0MatmulKernel` interface + `KernelProvider.matmulQ4_0()` (default `null`) and a `"Q4_0"` case in `supports()`.
- `ScalarQ4_0MatmulKernel` (portable commonMain floor) wired via `ScalarKernelProvider`.
- `DefaultCpuOpsJvm` lazy `q4_0MatmulKernel` (KernelRegistry) + `is Q4_0TensorData ->` branch in `chooseQuantizedMatmul`.

Layout correctness note
Uses the canonical ggml split nibble layout (low nibbles → elements 0..15, high → 16..31; `(code - 8) * d`) matching `DequantOps.dequantQ4_0FromBytes`, not the interleaved layout the existing JVM MemSeg `dotQ4_0BlockMemSeg` uses. That mismatch is the likely reason the Q4_0 MemSeg path was never exercised; PR2 reconciles the MemSeg kernel to this layout.

Tests
- `Q4_0TensorDataTest`: pins split layout + `(code-8)*scale` dequant against the canonical ggml decode.
- `Q4_0MatmulDispatchTest`: dispatch routes through the kernel and matches the scalar reference (single/multi-batch, dim×dim).
- `KernelProviderSupportsTest`: extended for the `Q4_0` capability query.
- `apiCheck` green.

Follow-ups (stacked)
PR2 Panama SIMD + MemSeg reconcile · PR3 Native FFM · PR4 FP32→Q4_0 quantizer + loader policy · PR5 docs. Targeting 0.27.0.
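For reviewers who want the layout correctness note in concrete form, here is a minimal standalone sketch of decoding one 18-byte Q4_0 block. It is not the code in this PR: the function names (`halfToFloat`, `dequantQ4_0Split`, `dequantQ4_0Interleaved`) are hypothetical, the block is assumed to start with a little-endian fp16 scale `d` followed by 16 packed bytes (as in ggml), and the "interleaved" variant is only one plausible reading of that term (low nibble → element 2j, high nibble → element 2j+1), included purely to show how the two layouts disagree.

```kotlin
import kotlin.math.pow

const val BLOCK_ELEMS = 32
const val BLOCK_BYTES = 18 // assumed: 2-byte fp16 scale + 16 bytes of packed nibbles

/** Decodes an IEEE 754 half-precision value held in the low 16 bits of [bits]. */
fun halfToFloat(bits: Int): Float {
    val sign = if ((bits and 0x8000) != 0) -1f else 1f
    val exp = (bits ushr 10) and 0x1F
    val mant = bits and 0x3FF
    val magnitude = when (exp) {
        0 -> mant * 2f.pow(-24)                                   // zero / subnormal
        0x1F -> if (mant == 0) Float.POSITIVE_INFINITY else Float.NaN
        else -> (1f + mant / 1024f) * 2f.pow(exp - 15)            // normal
    }
    return sign * magnitude
}

/** Reads the assumed little-endian fp16 scale at the start of the block. */
private fun readScale(block: ByteArray, offset: Int): Float {
    val lo = block[offset].toInt() and 0xFF
    val hi = block[offset + 1].toInt() and 0xFF
    return halfToFloat(lo or (hi shl 8))
}

/** Split layout: low nibbles fill elements 0..15, high nibbles fill 16..31. */
fun dequantQ4_0Split(block: ByteArray, offset: Int = 0): FloatArray {
    require(block.size - offset >= BLOCK_BYTES) { "need $BLOCK_BYTES bytes" }
    val d = readScale(block, offset)
    val out = FloatArray(BLOCK_ELEMS)
    for (j in 0 until 16) {
        val packed = block[offset + 2 + j].toInt() and 0xFF
        out[j] = ((packed and 0x0F) - 8) * d       // (code - 8) * d
        out[j + 16] = ((packed ushr 4) - 8) * d
    }
    return out
}

/** Hypothetical interleaved layout, for contrast only: byte j feeds elements 2j and 2j+1. */
fun dequantQ4_0Interleaved(block: ByteArray, offset: Int = 0): FloatArray {
    require(block.size - offset >= BLOCK_BYTES) { "need $BLOCK_BYTES bytes" }
    val d = readScale(block, offset)
    val out = FloatArray(BLOCK_ELEMS)
    for (j in 0 until 16) {
        val packed = block[offset + 2 + j].toInt() and 0xFF
        out[2 * j] = ((packed and 0x0F) - 8) * d
        out[2 * j + 1] = ((packed ushr 4) - 8) * d
    }
    return out
}

fun main() {
    // Scale 1.0 (fp16 0x3C00); every data byte 0x88 (both codes 8, i.e. value 0),
    // except the first data byte 0xF0 (low code 0, high code 15).
    val block = ByteArray(BLOCK_BYTES) { 0x88.toByte() }
    block[0] = 0x00.toByte()
    block[1] = 0x3C.toByte()
    block[2] = 0xF0.toByte()

    val split = dequantQ4_0Split(block)
    val interleaved = dequantQ4_0Interleaved(block)
    println("split:       e0=${split[0]} e1=${split[1]} e16=${split[16]}")
    println("interleaved: e0=${interleaved[0]} e1=${interleaved[1]} e16=${interleaved[16]}")
}
```

With this test block, the high nibble of the first data byte lands at element 16 under the split layout but at element 1 under the interleaved reading, so a kernel and a dequantizer that disagree on the layout would produce different results from the same bytes.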
🤖 Generated with Claude Code